Abstract: A generalization of the problem of writing on dirty 
paper is considered in which one transmitter sends a common 
message to multiple receivers. Each receiver experiences on its 
link an additive interference (in addition to the additive noise), 
which is known noncausally to the transmitter but not to any 
of the receivers. Applications range from wireless multi-antenna 
multicasting to robust dirty paper coding. 

We develop results for memoryless channels in Gaussian 
and binary special cases. In most cases, we observe that the 
availability of side information at the transmitter increases 
capacity relative to systems without such side information, and 
that the lack of side information at the receivers decreases 
capacity relative to systems with such side information. 

For the noiseless binary case, we establish the capacity when 
there are two receivers. When there are many receivers, we show 
that the transmitter side information provides a vanishingly small 
benefit. When the interference is large and independent across 
the users, we show that time sharing is optimal. 

For the Gaussian case we present a coding scheme and 
establish its optimality in the high signal-to-interference-plus- 
noise limit when there are two receivers. When the interference 
is large and independent across users we show that time-sharing 
is again optimal. Connections to the problem of robust dirty 
paper coding are also discussed. 



I. Introduction 

The study of communication over channels controlled by 
a random state parameter known only to the transmitter was 
initiated by Shannon [21]. Shannon considered the case where 
the state sequence is known causally at the encoder. Subsequently,
Gel'fand and Pinsker [10] analyzed the case where the
state sequence is available noncausally. The noncausal model 
has found application in diverse areas, ranging from coding for 
memory with defects [12], [18], to digital watermarking [3], 
[4], [20], and to coding for the multiple-input/multiple-output 
(MIMO) broadcast channel [1], [25]. 

Costa [6] considered a version of the Gel'fand-Pinsker 
model in which there is an additive white Gaussian inter- 
ference ("dirt"), which constitutes the state, in addition to 
independent additive white Gaussian noise. The key result in 
this "dirty paper coding" scenario is that there is no loss in 
capacity if the interference is known only to the transmitter.

By contrast, there has been very limited work to date 
on multiuser channels with state parameters known to the 
transmitter(s). In an early work in this area, Gel'fand and 
Pinsker [11] show that the Gaussian broadcast channel with 
independent messages incurs no loss in capacity if the interference
sequences are known noncausally to the transmitter.
Some other multiuser settings are also discussed. The degraded 
broadcast channel with independent messages and state se- 
quence known to the transmitter either causally or non-causally 
is examined in [23]. Other works on multiuser channels with 
state parameters include [17], [2], [16], [13] and [22]. 

This paper examines the common-message broadcast chan- 
nel, which we refer to as the multicast channel. Specifically, 
we consider a scenario in which one transmitter broadcasts 
a common message to multiple receivers. In addition to 
additive noise, associated with the link to each receiver is 
a corresponding additive interference. The collection of such 
interferences is thus the (random) state of the multiuser 
channel. In our model, the transmitter has perfect noncausal 
knowledge of all these interference sequences, but none of 
the receivers have knowledge of any of them. This model and 
its generalizations arise in a variety of multi-antenna wireless 
multicasting problems as well as in applications of robust dirty 
paper coding where only imperfect knowledge of the state is 
available to the transmitter.

The capacity of some binary versions of such multicast 
channels is reported in [14], [15]. For more general channels, 
[24] reports achievable rates for broadcasting common and 
independent messages over a discrete memoryless channel 
with noncausal state knowledge at the transmitter. The case 
of two-user Gaussian channels with jointly and individually 
independent identically distributed (i.i.d.) Gaussian interfer- 
ences on each link is also considered in [24], for which it is 
conjectured that in the limit of large interference, time-sharing 
between the two receivers is optimum even when both are 
only interested in a common message. Among other results, 
in this paper we establish that this conjecture is true. We upper 
bound the capacity of the Gaussian channel and show that it 
approaches the time-sharing rate in this limit. In addition, we 
also present a coding scheme that is asymptotically optimal in 
the limit of high signal-to-interference-plus-noise ratio (SINR).^1

An outline of the paper is as follows. Section II presents 
the general multicast channel model of interest. The binary 
special cases of interest are analyzed in Section III, and the 
Gaussian special cases of interest are analyzed in Section IV. 
Finally, Section V contains some conclusions and directions 
for future work. The proofs of the converses are deferred to 
the Appendices. 

^1 Throughout this work, symbol refers to a real symbol. 

II. Multicast Channel Model 

The K-user multicast channel of interest is defined as 
follows. 

Definition 1: A K-user discrete memoryless multicast 
channel with random parameters consists of an input alphabet 
$\mathcal{X}$, output alphabets $\mathcal{Y}_1, \mathcal{Y}_2, \ldots, \mathcal{Y}_K$ for receivers 1, 2, ..., K, 
respectively, and a state alphabet $\mathcal{S}$. For a given state sequence 
$s^n = (s_1, s_2, \ldots, s_n)$ such that $s_i \in \mathcal{S}$ and input 
$x^n = (x_1, x_2, \ldots, x_n)$ such that $x_i \in \mathcal{X}$, the channel outputs are 
distributed according to 

$$p(y_1^n, y_2^n, \ldots, y_K^n \mid x^n, s^n) = \prod_{i=1}^{n} p(y_{1i}, y_{2i}, \ldots, y_{Ki} \mid x_i, s_i) \qquad (1)$$

where $y_k^n = (y_{k1}, y_{k2}, \ldots, y_{kn})$, for all $y_{ki} \in \mathcal{Y}_k$, $k = 1, 2, \ldots, K$. 
Moreover, $p(s^n) = \prod_{i=1}^{n} p(s_i)$. The particular 
realization $s^n$ is known noncausally to the transmitter before 
using the channel, but not to any of the K receivers. 

Fig. 1. Two-user memoryless, noiseless binary multicast channel with 
additive interference. The encoder maps message W into codeword $X^n$. The 
state takes the form of interference sequences $S_1^n$ and $S_2^n$. Each channel 
output $Y_k^n = X^n \oplus S_k^n$, where $\oplus$ denotes symbol-by-symbol modulo-two 
addition, is decoded to produce message estimate $\hat{W}_k$. 

It is worth emphasizing that the above definition includes 
the case where the channel of User k is controlled by its 
own state $s_k^n$. In such cases, the joint state is, with slight 
abuse of notation, $s^n = (s_1^n, s_2^n, \ldots, s_K^n)$, so that 
$p(s_i) = p(s_{1i}, s_{2i}, \ldots, s_{Ki})$. 

The capacity of the channel of Definition 1 is defined as 
follows. 

Definition 2: A $(2^{nR}, n)$ code consists of a message set 
$\mathcal{W}_n = \{1, 2, \ldots, 2^{nR}\}$, an encoder $f_n : \mathcal{W}_n \times \mathcal{S}^n \to \mathcal{X}^n$, and 
K decoders $g_{k,n} : \mathcal{Y}_k^n \to \mathcal{W}_n$ for $k = 1, \ldots, K$. The rate R 
is achievable if there exists a sequence of codes such that for 
W uniformly distributed over $\mathcal{W}_n$ we have 

$$\lim_{n \to \infty} P_e^n = \lim_{n \to \infty} \Pr\left( \bigcup_{k=1}^{K} \{ g_{k,n}(Y_k^n) \ne W \} \right) = 0. \qquad (2)$$

Note that the error probability in (2) is averaged over all state 
sequences and messages. The capacity C is the supremum of 
achievable rates. 

In the remainder of the paper, we focus on special cases 
of the memoryless channel in Definition 1. In particular, we 
focus on binary and Gaussian cases in which the state is an 
additive interference; for results on the memory with defects 
multicast channel, see, e.g., [14]. 

III. Noiseless Binary Case 

We first consider the noiseless binary special case of 
Definition 1. Specifically, the channel outputs $Y_1^n, Y_2^n, \ldots, Y_K^n$ 
depend on the input $X^n$ and the states $S_1^n, S_2^n, \ldots, S_K^n$ 
according to 

$$Y_k^n = X^n \oplus S_k^n, \quad k = 1, 2, \ldots, K, \qquad (3)$$

where $X_i, S_{ki} \in \{0, 1\}$, and $\oplus$ denotes symbol-by-symbol 
modulo-two addition (i.e., exclusive-or). In (3), the memoryless 
case of interest corresponds to the requirement that 
the $(S_{1i}, S_{2i}, \ldots, S_{Ki})$ for $i = 1, 2, \ldots, n$ form an i.i.d. 
sequence of K-tuples. In particular, for each i the variables 
$\{S_{1i}, S_{2i}, \ldots, S_{Ki}\}$ may in general be statistically dependent, 
and do not need to be identically distributed. As a result, we 
express our results in terms of the properties of a generic 
K-tuple in this sequence, which we denote by $(S_1, S_2, \ldots, S_K)$. 

Note that with only a single receiver (K = 1), the capacity 
is trivially 1 [bit per channel use],^2 which is achieved by 
interference precancellation, i.e., by choosing $X^n = S^n \oplus B^n$, 
so that $Y^n = B^n$, where $B^n$ is the bit representation for the 
message W. As we will now develop, when there are multiple 
receivers, capacity is generally less than this ideal single-user 
rate. 

A. The Case of K = 2 Receivers 

The case of two receivers, which is depicted in Fig. 1, 
is the simplest nontrivial scenario since perfect interference 
precancellation is not possible simultaneously for both users. 

One lower bound on the two-user capacity corresponds to a 
time-sharing approach that precancels the interference of one 
of the receivers at a time, yielding a rate of $R_{TS} = 1/2$. 
Another lower bound corresponds to ignoring the interference 
at the transmitter, i.e., treating each of the channels as a binary 
symmetric channel. This strategy yields a rate of 
$R_{IS} = 1 - \max\{H(S_1), H(S_2)\}$. It turns out that the former bound is 
only tight when $S_1$ and $S_2$ are independent and $\mathcal{B}(1/2)$, and 
the latter bound is only tight when both $S_1$ and $S_2$ are $\mathcal{B}(0)$.^3 

A coding theorem for the channel is as follows. 

Theorem 1: The capacity of the two-user noiseless, memoryless 
binary channel with additive interference is given by 

$$C = 1 - \frac{1}{2} H(S_1 \oplus S_2). \qquad (4)$$

Proof: A converse is provided in Appendix I. The 
achievability argument is detailed below: 

1) Select $2^{nR}$ codewords randomly according to an i.i.d. 
$\mathcal{B}(1/2)$ distribution in a codebook $\mathcal{C}$ of rate R strictly 
less than the capacity (4). Denote these codewords as 
$B^n(1), B^n(2), \ldots, B^n(2^{nR})$, so a message w is represented 
by codeword $B^n(w)$. 

2) Select a sequence $A^n$ by flipping a fair coin for each 
symbol index (the realization of which is also known at 
^2 From now on, except in the case of ambiguity, the units of "bits per 
channel use" will be omitted. 

^3 We use $\mathcal{B}(q)$ to denote a Bernoulli random variable with parameter q, i.e., 
$\Pr(S = 1) = q$, $\Pr(S = 0) = 1 - q$. 

Fig. 2. Achievable rates for the two-user noiseless binary multicast channel 
with independent and identically distributed interferences, as a function of 
the strength of the interference. Capacity is indicated by the solid curve, 
time-sharing performance is indicated by the horizontal dashed line, and the 
performance of a system that ignores the side information is indicated by the 
downward sloping dashed curve. 



the decoders [26]). Select the set $\mathcal{A}_1$ of symbol indices 
where $A_i = 1$, and precancel the interference at those 
indices for user 1, and precancel the interference at 
the remaining indices $\mathcal{A}_2$ (with $A_i = 0$) for user 2. 
Specifically, the transmitted sequence is of the form 

$$X_i = \begin{cases} B_i(w) \oplus S_{1i}, & i \in \mathcal{A}_1 \\ B_i(w) \oplus S_{2i}, & i \in \mathcal{A}_2. \end{cases} \qquad (5)$$

With this encoding, receiver 1 then observes a version of 
$B^n(w)$ where $|\mathcal{A}_1|$ symbols are correct, and the remaining 
$|\mathcal{A}_2|$ symbols are corrupted by interference $S_{1i} \oplus S_{2i}$, $i \in \mathcal{A}_2$, 
corresponding to a binary symmetric channel with crossover 
probability $q' = \Pr\{S_1 \oplus S_2 = 1\}$. Receiver 2 experiences the 
opposite effect. Thus for large n we have, since $|\mathcal{A}_1|/n \to 1/2$, 

$$\frac{1}{n} I(B^n; Y_k^n \mid A^n) \to \frac{1}{2} + \frac{1}{2}\left(1 - H(S_1 \oplus S_2)\right), \quad k = 1, 2, \qquad (6)$$

which is C in (4). As the mutual information expression in (6) 
indicates, the decoding of $Y_k^n$ to the message $\hat{W}_k$ is done by 
using the knowledge of $\mathcal{A}_1$ and $\mathcal{A}_2$ (i.e., $A^n$) at the decoders. 
In particular, receiver 1 selects a codeword which agrees with 
the received symbols in the set $\mathcal{A}_1$ and which is typical with 
noise $S_1 \oplus S_2$ with the symbols in the set $\mathcal{A}_2$. For decoder 
2, the order of the sets is reversed. As long as R is strictly less than C, $\hat{W}_k$ 
equals W with high probability. ■ 
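
As an illustrative numerical sketch (not part of the original development), the index-splitting encoder described in the achievability argument can be simulated directly. The function name, the block length, and the choice of independent Bernoulli(q) interferences are our own assumptions for illustration. The sketch checks that receiver 1 sees its precancelled indices without error and sees the remaining indices through a binary symmetric channel whose crossover probability is close to $q' = 2q(1-q)$.

```python
import random


def simulate_precancellation(n=20000, q=0.1, seed=0):
    """Simulate the two-user index-splitting encoder for one long block.

    Assumption for illustration: S1 and S2 are independent Bernoulli(q).
    Returns the empirical error fractions seen by receiver 1 on the
    indices precancelled for it (A_i = 1) and on the others (A_i = 0).
    """
    rng = random.Random(seed)
    clean_err = clean_cnt = noisy_err = noisy_cnt = 0
    for _ in range(n):
        b = rng.getrandbits(1)              # codeword bit B_i(w)
        s1 = 1 if rng.random() < q else 0   # interference on link 1
        s2 = 1 if rng.random() < q else 0   # interference on link 2
        a = rng.getrandbits(1)              # shared fair coin A_i
        x = b ^ (s1 if a == 1 else s2)      # precancel for user 1 or user 2
        y1 = x ^ s1                         # noiseless binary channel output
        if a == 1:
            clean_cnt += 1
            clean_err += int(y1 != b)
        else:
            noisy_cnt += 1
            noisy_err += int(y1 != b)
    return clean_err / clean_cnt, noisy_err / noisy_cnt
```

On the indices with $A_i = 0$ the receiver observes $B_i \oplus S_{1i} \oplus S_{2i}$, so the second returned value should concentrate around $2q(1-q)$ for large n.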

Fig. 2 shows the performance gains of optimal coding 
relative to time-sharing and disregarding the side information. 
In particular, the achievable rate in the case of independent 
interferences is plotted as a function of the strength of the 
interference as measured by $q = \Pr\{S_1 = 1\} = \Pr\{S_2 = 1\}$. 
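
The three curves compared in this discussion can be tabulated with a short script. This is a minimal sketch under the assumption that $S_1$ and $S_2$ are independent $\mathcal{B}(q)$, so that $S_1 \oplus S_2$ is Bernoulli with parameter $2q(1-q)$; the capacity expression used is $1 - \frac{1}{2}H(S_1 \oplus S_2)$, as we read Theorem 1, and the function names are ours.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def two_user_rates(q):
    """Return (capacity, time-sharing rate, ignore-side-information rate).

    Assumes independent Bernoulli(q) interferences on the two links.
    """
    q_xor = 2 * q * (1 - q)                      # Pr{S1 xor S2 = 1}
    capacity = 1 - 0.5 * binary_entropy(q_xor)   # 1 - (1/2) H(S1 xor S2)
    time_sharing = 0.5                           # precancel one user at a time
    ignore_side_info = 1 - binary_entropy(q)     # treat each link as a BSC(q)
    return capacity, time_sharing, ignore_side_info


if __name__ == "__main__":
    for q in (0.0, 0.05, 0.1, 0.25, 0.5):
        print(q, two_user_rates(q))
```

At q = 1/2 the capacity and time-sharing rates coincide at 1/2, and at q = 0 the capacity and the ignore-side-information rate coincide at 1, in line with the tightness conditions stated for the two simple lower bounds.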

Three immediate conclusions can be drawn from Theorem 1. 
First, transmitter-only side information incurs a penalty 
relative to system-wide side information unless $S_1$ and $S_2$ are 
completely dependent random variables, i.e., unless $S_2 = S_1$ 
or $S_2 = \bar{S}_1$. Second, time-sharing is strictly suboptimal 
except when $S_1$ and $S_2$ are independent $\mathcal{B}(1/2)$ random 
variables. We emphasize that, by contrast, when there are 
independent messages for each of the receivers in Fig. 1, 
time-sharing between the receivers is optimal and there is 
no loss in the capacity region with side information only 
at the transmitter. Finally, ignoring the side information at 
the transmitter is strictly suboptimal except when 
$H(S_1) = H(S_2) = 0$. 

We make a few additional observations. 

Some Further Remarks: 

1) The achievability argument can also be obtained via a 
different, more direct, but perhaps less intuitive route as 
follows. First note that a straightforward extension of 
the random binning argument for the single user case 
[10] shows that the following rate is achievable for the 
K-user multicast channel with random parameters: 

$$R = \max_{p(U \mid S),\, p(X \mid U, S)} \left\{ \min_{k} I(U; Y_k) - I(U; S) \right\}. \qquad (7)$$

Here U is an auxiliary random variable (over some 
alphabet $\mathcal{U}$) that satisfies the Markov constraint 
$U \leftrightarrow (X, S) \leftrightarrow Y_k$ for $k = 1, 2, \ldots, K$. 
For the two-user binary channel, the following choice of 
U yields the achievability of (4). Let the alphabet of U 
be $\mathcal{U} = \{u_1, u_2, u_3, u_4\}$, and let 

$$U = A\left\{u_1 (X \oplus S_1) + u_2\, \overline{(X \oplus S_1)}\right\} + \bar{A}\left\{u_3 (X \oplus S_2) + u_4\, \overline{(X \oplus S_2)}\right\}, \qquad (8)$$

where X is a $\mathcal{B}(1/2)$ random variable, independent of $S_1$ 
and $S_2$, and A is also $\mathcal{B}(1/2)$ and independent of X, 
$S_1$ and $S_2$, and where the overbar denotes the complement of a 
(binary-valued) variable. 

2) The code construction outlined above suggests that the 
transmitter does not require noncausal knowledge of 
the interference. We emphasize, however, that this result is 
specific to the noiseless binary channel model. 

3) It is straightforward to verify that random linear codes 
are sufficient to achieve the capacity of Theorem 1. It 
suffices to use an argument analogous to that used by 
Gallager for the binary symmetric channel [9, Sec. 6.2]. 

4) Theorem 1 can be readily generalized to the case of 
state sequences that are not in general i.i.d. In this case 
the term $H(S_1 \oplus S_2)$ in (4) is simply replaced with the 
entropy rate of $S_1^n \oplus S_2^n$. 

5) Our achievability scheme also applies in the presence of 
noise. For the channel model 

$$Y_1 = X \oplus S_1 \oplus Z_1,$$
$$Y_2 = X \oplus S_2 \oplus Z_2,$$

where $Z_1$ and $Z_2$ are mutually independent and identically 
distributed Bernoulli random variables and independent 
of all other variables, we can show that a rate 

$$R = 1 - \frac{1}{2} H(S_1 \oplus S_2 \oplus Z_1) - \frac{1}{2} H(Z_1)$$

is achievable and an upper bound is given by 

$$R_+ = 1 - \frac{1}{2} H(S_1 \oplus S_2) - \frac{1}{2} H(Z_1).$$

Note that time-sharing is optimal in the special case 
when $S_1$ and $S_2$ are independent $\mathcal{B}(1/2)$ random variables. 



B. The Case of K > 2 Receivers 

When there are more than two receivers, further losses in 
capacity ensue, as we now develop. Specifically, we have the 
following bounds on capacity. 

Theorem 2: The capacity of the K-user noiseless binary 
channel in which the generic $S_1, S_2, \ldots, S_K$ are mutually 
independent and identically distributed^4 is bounded according 
to: 

$$R_- \le C \le R_+, \qquad (9a)$$

where 

$$R_+ = 1 - \frac{1}{K} H(S_1 \oplus S_2, S_1 \oplus S_3, \ldots, S_1 \oplus S_K), \qquad (9b)$$

$$R_- = \max\left\{ 1 - \left(1 - \frac{1}{K}\right) H(S_1 \oplus S_2),\; 1 - H(S_1) \right\}. \qquad (9c)$$

Proof: The upper bound (9b) is established in 
Appendix II. The lower bound (9c) is obtained via a direct 
generalization of the code construction (5) in the case of two 
users. Specifically, it suffices to consider a code construction 
that divides each codeword into K equally sized blocks and 
precancels the interference for a different user in each of the 
blocks. Each user then experiences one clean block and $K - 1$ 
noisy blocks governed by a binary symmetric channel with 
crossover probability $q' = \Pr\{S_1 \oplus S_2 = 1\}$ as before. ■ 

In general, the lower and upper bounds in (9) do not 
coincide.^5 However, the associated rate gap decreases 
monotonically with the number of receivers K. Moreover, even for 
K = 3, it is small, as Fig. 3 illustrates. 

The rate gap also decays to zero in the limit of large K, 
which follows readily from Theorem 2. In particular, 
$C \to 1 - H(S)$ as $K \to \infty$, where S denotes a generic random 
variable with the distribution of the $S_k$. To see this, it suffices 
to recognize that when $S_1, S_2, \ldots, S_K$ are i.i.d., 

$$\left(1 - \frac{1}{K}\right) H(S) \le \frac{1}{K} H(S_1 \oplus S_2, S_1 \oplus S_3, \ldots, S_1 \oplus S_K) \le H(S). \qquad (10)$$

As $K \to \infty$, the lower and upper bounds in (10) converge, 
so that the upper bound on capacity (9b) converges to 
$1 - H(S)$. However, this rate is achievable by simply treating 
the interference as noise at the receivers, so it is the limiting 
capacity. It should be emphasized that this implies that when 

^4 Our results actually hold more generally provided the distribution 
across the interference sequences is symmetric, i.e., if for all 
m, $p(s_{k_1}, s_{k_2}, \ldots, s_{k_m})$ is independent of the specific choice of 
$k_1, k_2, \ldots, k_m \in \{1, 2, \ldots, K\}$. 

^5 A slightly improved lower bound appears in [14], but it, too, does not 
match the upper bound. 




Fig. 3. Upper bound and lower bounds on the capacity of the three-user 
noiseless binary multicast channel, as a function of the strength of the 
interference $q = \Pr\{S_k = 1\}$. The solid curves depict the two bounds of (9). The horizontal 
dashed line indicates the performance of time-sharing, while the other dashed 
curve indicates the performance of a strategy in which the side information 
is ignored by the transmitter. 



the number of receivers is large, the side information available 
to the transmitter is essentially useless. 

We can also use (10) to bound the rate penalty associated 
with ignoring side information as a function of the number of 
receivers K. In particular, the gap is at most $H(S)/K$. 

Finally, we can use Theorem 2 to establish that in the limit 
of large interference, time-sharing is optimal for every K. 
Specifically, when $S_k \sim \mathcal{B}(1/2)$, the capacity is $C = 1/K$ 
and is achieved through time-sharing. To see this, it suffices 
to specialize the upper bound in (9b). Specifically, $S_1 \oplus S_k$ for 
$k = 2, 3, \ldots, K$ are independent $\mathcal{B}(1/2)$ random variables, so 
the joint entropy is $K - 1$. 
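
The K-user quantities discussed above can be evaluated numerically. The following is a minimal sketch assuming i.i.d. $\mathcal{B}(q)$ interferences: it computes the joint entropy $H(S_1 \oplus S_2, \ldots, S_1 \oplus S_K)$ by conditioning on $S_1$, then the upper bound $1 - \frac{1}{K}H(\cdot)$, the rate of the K-block precancellation scheme described in the proof of Theorem 2, and the rate $1 - H(S)$ obtained by ignoring the side information. All function names are ours.

```python
from math import comb, log2


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def xor_vector_entropy(K, q):
    """H(S1^S2, S1^S3, ..., S1^SK) in bits for i.i.d. Bernoulli(q) states.

    A (K-1)-tuple of Hamming weight w has probability
    (1-q) q^w (1-q)^(K-1-w)  [S1 = 0]  +  q q^(K-1-w) (1-q)^w  [S1 = 1].
    """
    total = 0.0
    for w in range(K):
        p = (1 - q) * q**w * (1 - q)**(K - 1 - w) + q * q**(K - 1 - w) * (1 - q)**w
        if p > 0:
            total -= comb(K - 1, w) * p * log2(p)
    return total


def k_user_bounds(K, q):
    """Return (upper bound, block-precancellation rate, ignore-SI rate)."""
    upper = 1 - xor_vector_entropy(K, q) / K
    block = 1 - (1 - 1 / K) * h2(2 * q * (1 - q))  # one clean block, K-1 noisy
    ignore = 1 - h2(q)
    return upper, block, ignore
```

For q = 1/2 the upper bound evaluates to 1/K, matching the time-sharing rate, and for any q the normalized joint entropy stays within the sandwich of (10).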

IV. Gaussian Case 

In this section we consider a memoryless Gaussian extension 
of Definition 1 and incorporate an average power 
constraint on the input. Unless otherwise stated, we restrict attention to 
the two-user (K = 2) case. In the scenario of interest, depicted 
in Fig. 4, the state is additive, and the associated interferences 
$S_k^n$ are zero-mean white Gaussian sequences of power Q. 
We first focus on the case of independent interferences and 
consider the case of correlated interferences in Section IV-A. 
In addition, each receiver's link also has a zero-mean additive 
white Gaussian noise $Z_k^n$ of power N. Thus, the observation 
at receiver k takes the form 

$$Y_k^n = X^n + S_k^n + Z_k^n, \quad k = 1, 2. \qquad (11)$$

Our power constraint takes the form 

$$\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} x_i^2(W, S_1^n, S_2^n) \right] \le P, \qquad (12)$$

where the expectation is taken over the ensemble of messages 
and interference sequences. Finally, note that without loss of 
Fig. 4. Two-user Gaussian multicast channel model with additive 
interference. The encoder maps message W into codeword $X^n$. The state 
takes the form of interference sequences $S_1^n$ and $S_2^n$. Each channel output 
$Y_k^n = X^n + S_k^n + Z_k^n$ is decoded to produce message estimate $\hat{W}_k$. 
The interference and noise sequences are i.i.d. and mutually independent. 
Furthermore, $S_1, S_2 \sim \mathcal{N}(0, Q)$ and $Z_1, Z_2 \sim \mathcal{N}(0, 1)$. 



generality, we may set N = 1, and interpret P as the signal- 
to-noise ratio (SNR), and Q as the interference-to-noise ratio 
(INR). 

For this channel, we present the following bounds on the 
capacity. 

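Theorem 3: An upper bound on the Gaussian multicast 
channel capacity is:^{6,7} 

$$C \le \min\{R_+^{I}, R_+^{II}\}, \qquad (13)$$

where 

$$R_+^{I} = \begin{cases} \frac{1}{4}\log(1 + P) + \frac{1}{4}\log\left( \frac{P + Q + 1 + 2\sqrt{PQ}}{Q} \right), & Q \ge 4 \\ \frac{1}{4}\log\left( \frac{1 + P}{Q/4 + 1} \right) + \frac{1}{4}\log\left( \frac{P + Q + 1 + 2\sqrt{PQ}}{Q/4 + 1} \right), & Q < 4 \end{cases} \qquad (14)$$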


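$$R_+^{II} = \begin{cases} \frac{1}{2}\log\left( \frac{1 + P + Q + 2\sqrt{PQ}}{1 + Q/2} \right), & Q < 2 \\ \frac{1}{2}\log\left( \frac{1 + P + Q + 2\sqrt{PQ}}{\sqrt{2Q}} \right) - \frac{1}{4}\left[ \log\left( \frac{Q}{2P + 2} \right) \right]^+, & Q \ge 2 \end{cases} \qquad (15)$$
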

We have presented two different upper bounds, denoted by 
$R_+^{I}$ and $R_+^{II}$, since neither bound dominates the other over 
all values of (P, Q). The two bounds have been derived by 
slightly different methods. The bound $R_+^{I}$ is obtained by 
observing that the channel is nontrivial even if we set one 
of the interferences (say $S_1$) to 0. Furthermore, it is possible 
to show that an upper bound on this modified channel is also 
an upper bound on the Gaussian multicast channel of interest. 
A complete derivation of this upper bound is presented in 
Appendix IV. The expression for $R_+^{II}$ is obtained by directly 
applying a chain of inequalities on the Gaussian multicast 
channel, and its derivation is presented in Appendix III. 

We remark here that the upper bounds are explicit expressions 
of the following minimizations over the parameter $\rho \in [-1, 1]$: 



T 1 ^1 

R. = mm - log - 

+ pel-is] 4 V 1 



^6 All logarithms are to the base 2 in this work. Also the notation $[f]^+$ refers 
to max(f, 0) in (15) and throughout the paper. 

^7 The trivial upper bound of $\frac{1}{2}\log(1 + P)$ is sometimes tighter than these 
two bounds, particularly in the limit of very small P. 



i?V = mill - loe; 
+ pe[-i.i]2 ^ 



- log , 



P + Q + 2/PQ + 1 
v/(l + p)(Q + i-p)^ 



Q 



(17) 



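Theorem 4: A lower bound on the Gaussian multicast channel 
capacity is: 

$$R_- = \begin{cases} \frac{1}{2}\log\left( 1 + \frac{P}{Q/2 + 1} \right), & Q/2 < 1 \\ \frac{1}{2}\log\left( \frac{P + Q/2 + 1}{Q} \right) + \frac{1}{4}\log\left( \frac{Q}{2} \right), & 1 \le Q/2 \le P + 1 \\ \frac{1}{4}\log(1 + P), & Q/2 > P + 1. \end{cases} \qquad (18)$$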
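Proof: 

The lower bound^8 (18) is an explicit expression of the 
following maximization: 

$$R_- = \max_{\{(P_A, P_D) :\, P_A \ge 0,\, P_D \ge 0,\, P_A + P_D \le P\}} R(P_A, P_D) \qquad (19a)$$

with 

$$R(P_A, P_D) = \frac{1}{2}\log\left( 1 + \frac{P_A}{P_D + Q/2 + 1} \right) + \frac{1}{4}\log(1 + P_D). \qquad (19b)$$

Accordingly, we show the achievability of (19b). The proposed 
scheme combines superposition coding, dirty paper 
coding, and time-sharing, and exploits a representation of the 
interferences in the form 

$$S_1^n = A^n + D^n, \qquad S_2^n = A^n - D^n, \qquad (20)$$

where 

$$A^n = (S_1^n + S_2^n)/2, \qquad D^n = (S_1^n - S_2^n)/2. \qquad (21)$$

For independent interferences of power Q, the sequences $A^n$ and $D^n$ 
are independent white Gaussian sequences, each of power Q/2. 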




We list the main steps for codebook generation, encoding, 
and decoding. The probability of error analysis is omitted 
as it is based on standard typicality arguments; see, e.g., [7]. 

Codebook Generation: The idea is to generate three codebooks. 
There is one common codebook, which both users 
share, and two private codebooks, each intended for the 
corresponding user. More specifically, we follow these 
steps: 

1) Decompose the message W into two submessages $W_A$ 
and $W_D$ and divide the power P into two powers $P_A$ and 
$P_D$ so that $P = P_A + P_D$. Message $W_A$ will be decoded 
by both the receivers while message $W_D$ will be decoded 
by only one receiver at a time. We will transmit it twice 
so that both the receivers can decode (see encoding and 
decoding rules below for a further description). 

2) Generate a codebook $\mathcal{C}_A$ for $W_A$ where the codewords 
$U_A^n$ are sampled from an i.i.d. Gaussian distribution 
$U_A = X_A + \alpha_A A$. Here $X_A$ is Gaussian $\mathcal{N}(0, P_A)$, 
independent of A, and $\alpha_A = P_A/(P + Q/2 + 1)$. A 
total of $2^{nI(U_A; Y_1)}$ codewords are thus generated and 
randomly partitioned into $2^{nR_A}$ bins. The rate of 
this codebook, $I(U_A; Y_1) - I(U_A; A)$, can be shown to 
be:^9 

$$R_A = \frac{1}{2}\log\left( 1 + \frac{P_A}{P_D + Q/2 + 1} \right). \qquad (22)$$

3) Generate two codebooks $\mathcal{C}_D^{(1)}$ and $\mathcal{C}_D^{(2)}$ for $W_D$ for the 
two receivers as follows. For $\mathcal{C}_D^{(1)}$ the codewords $U_D^n$ are 
sampled from an i.i.d. Gaussian distribution 
$U_D = X_D + \alpha_D((1 - \alpha_A)A + D)$, where $X_D$ is Gaussian $\mathcal{N}(0, P_D)$, 
independent of A and D, and $\alpha_D = P_D/(P_D + 1)$. 
Generate $2^{nI(U_D; Y_1, U_A)}$ such codewords and partition 
them into $2^{nR_D}$ bins. Follow an analogous construction 
for codebook $\mathcal{C}_D^{(2)}$. The rate of each codebook,^10 
$I(U_D; Y_1, U_A) - I(U_D; A, D)$, can be shown to be: 

$$R_D = \frac{1}{2}\log(1 + P_D). \qquad (23)$$

^8 Our lower bound for Q/2 < 1 was also independently reported by 
Costa [5]. 

Encoding: We transmit a superposition of two sequences 
corresponding to $W_A$ and $W_D$ as follows: 

1) To encode a message $W_A$, find a codeword $U_A^n$ in the bin 
of $W_A$ such that $X_A^n = U_A^n - \alpha_A A^n$ satisfies a power 
constraint of $P_A$. By construction, such a codeword 
exists with high probability. 

2) To encode $W_D$, we decide whether to send it to user 1 or 
2. The users are served alternately. When we decide to 
send it to user 1, we select a codeword $U_D^n$ in the bin of 
codebook $\mathcal{C}_D^{(1)}$ corresponding to message $W_D$ such that 
$X_D^n = U_D^n - \alpha_D\{(1 - \alpha_A)A^n + D^n\}$ satisfies a power 
constraint of $P_D$. When we decide to transmit to user 2, 
we select a codeword $U_D^n$ in the bin of codebook $\mathcal{C}_D^{(2)}$ 
corresponding to message $W_D$ such that 
$X_D^n = U_D^n - \alpha_D\{(1 - \alpha_A)A^n - D^n\}$ satisfies the power constraint 
of $P_D$. Since there are $2^{nI(U_D; A, D)}$ codewords in each 
bin, such a codeword exists with high probability. 

3) Send the superposition $X^n = X_A^n + X_D^n$, which has 
power P, over the channel. 

Decoding: The decoding exploits successive cancellation 
(stripping) and proceeds as follows: 

1) Decode $U_A^n$ from $Y_1^n$ or $Y_2^n$, treating $X_D^n$ as part of the 
noise. The received signals are of the form 

$$Y_1^n = X_A^n + A^n + (D^n + Z_1^n + X_D^n) = U_A^n + (1 - \alpha_A)A^n + (D^n + Z_1^n + X_D^n),$$

$$Y_2^n = X_A^n + A^n + (-D^n + Z_2^n + X_D^n) = U_A^n + (1 - \alpha_A)A^n + (-D^n + Z_2^n + X_D^n).$$

Since $D^n + Z_1^n + X_D^n$ is an i.i.d. Gaussian $\mathcal{N}(0, P_D + Q/2 + 1)$ 
sequence, independent of $A^n$, our choice of 
rate $R_A$ in (22) ensures that the resulting $\hat{W}_A$ equals 
$W_A$ with high probability at both the receivers. 

^9 Using a symmetry argument or otherwise, note that $I(U_A; Y_1) = I(U_A; Y_2)$, 
so we use the generic term $I(U_A; Y_1)$ to denote either of these. 

^10 Notice that the codebooks can be the same for the two users. For notational 
convenience while dealing with the two users we keep the codebooks separate, 
since a codeword typical with $Y_1^n$ will not in general be typical with $Y_2^n$. 
See the encoding rules below. 

Fig. 5. Upper and lower bounds on the capacity of the two-user Gaussian 
multicast channel, as a function of the INR Q (in dB) for an SNR P = 33 dB. The 
upper two curves depict the two upper bounds from (15) and (14). The 
marked line is the achievable rate in (18). The horizontal dashed line indicates 
the performance of time-sharing, while the other dashed curve indicates the 
performance of a strategy in which the side information is treated by the 
transmitter as additional noise on each link. 



2) Subtract the decoded $U_A^n$ from each of $Y_1^n$ and $Y_2^n$, so 
that the residual signals $\tilde{Y}_k^n = Y_k^n - U_A^n$ are of the form 

$$\tilde{Y}_1^n = X_D^n + ((1 - \alpha_A)A^n + D^n) + Z_1^n, \qquad (24)$$

$$\tilde{Y}_2^n = X_D^n + ((1 - \alpha_A)A^n - D^n) + Z_2^n. \qquad (25)$$

The rate $R_D$ in (23) ensures that $U_D^n$ can be decoded 
from either $\tilde{Y}_1^n$ or $\tilde{Y}_2^n$ so that the resulting $\hat{W}_D$ equals 
$W_D$ with high probability at the corresponding receiver. 
Specifically, for the fraction of time that the transmitter 
encodes $W_D$ for interference $(1 - \alpha_A)A^n + D^n$, 
user 1 can recover $W_D$, while for the fraction of 
time that the transmitter encodes $W_D$ for interference 
$(1 - \alpha_A)A^n - D^n$, user 2 can recover $W_D$. 

From this coding strategy, we see that the average rate 
delivered to each receiver is identical, i.e., $R_A + (1/2)R_D$. 
Maximizing this rate over the choices of $P_A$ and $P_D$ subject 
to the constraint $P = P_A + P_D$ optimizes the lower bound, 
whence (19a). 

■ 

From (18), we obtain several useful insights. First, note that 
in the high INR regime ($Q/2 > P + 1$), our lower bound 
reduces to time-sharing, while in the low INR regime ($Q/2 < 1$) 
it reduces to dirty paper coding with respect to $A^n$. In 
the moderate interference regime, our bound shows that one 
can generally achieve a gain over these two strategies by a 
superposition coding approach that combines them. 
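
The three regimes can be checked numerically. The sketch below assumes the rate split $R_A + \frac{1}{2}R_D$ with $R_A = \frac{1}{2}\log(1 + P_A/(P_D + Q/2 + 1))$ and $R_D = \frac{1}{2}\log(1 + P_D)$, as we read (22) and (23), and compares a brute-force search over the power split with the piecewise closed form that follows from setting the derivative to zero (optimum at $P_D = Q/2 - 1$, clipped to $[0, P]$). The function names and the grid resolution are our own choices.

```python
from math import log2


def rate_split(P_A, P_D, Q):
    """Average per-user rate R_A + R_D / 2 for a given power split (N = 1)."""
    r_a = 0.5 * log2(1 + P_A / (P_D + Q / 2 + 1))  # common part, DPC against A
    r_d = 0.5 * log2(1 + P_D)                      # private part, time-shared
    return r_a + 0.5 * r_d


def lower_bound_closed_form(P, Q):
    """Piecewise maximizer of rate_split over P_A + P_D = P."""
    if Q / 2 < 1:                 # low INR: all power to the common codebook
        return 0.5 * log2(1 + P / (Q / 2 + 1))
    if Q / 2 <= P + 1:            # moderate INR: P_D = Q/2 - 1
        return 0.5 * log2((P + Q / 2 + 1) / Q) + 0.25 * log2(Q / 2)
    return 0.25 * log2(1 + P)     # high INR: pure time-sharing


def lower_bound_grid(P, Q, steps=20000):
    """Brute-force maximization over P_D on a uniform grid in [0, P]."""
    return max(rate_split(P - P * i / steps, P * i / steps, Q)
               for i in range(steps + 1))
```

For example, with P = 10 and Q = 8 the optimum lies at $P_D = 3$, strictly inside the interval, so neither pure dirty paper coding against $A^n$ nor pure time-sharing attains the maximum.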

The behavior of the bounds as a function of INR is depicted 
in Fig. 5 for a fixed SNR of P = 33 dB. When the INR is 
very small ($Q \ll 1$), Fig. 5 reflects the rather obvious fact that 
the side information can be ignored by the transmitter without 
sacrificing rate. Similarly, when the INR is large ($Q \gg 1$), 
Fig. 5 reflects that time-sharing between the two users achieves 
the capacity. More generally, 
Fig. 6. Upper and lower bounds on the capacity of the two-user Gaussian 
multicast channel, as a function of SNR P for an INR Q = 15 dB. The upper 
two curves depict the two upper bounds in (15) and (14). The achievable rate 
in (18) is also shown. The dashed curve indicates the performance of 
time-sharing, while the dash-dotted curve indicates the performance of a strategy 
in which the side information is treated by the transmitter as additional noise 
on each link. 



the limit of high SINR ($P \to \infty$, Q fixed). For $Q > 2$ it can 
be expressed as $C(P) = \frac{1}{2}\log\left( \frac{P}{\sqrt{2Q}} \right) + o(1)$, where $o(1) \to 0$ 
as $P \to \infty$. For $Q < 2$ it can be expressed as 
$C(P) = \frac{1}{2}\log\left( \frac{P}{1 + Q/2} \right) + o(1)$. Finally, for the case of fixed P and 
$Q \to \infty$, time-sharing between the two users is optimal and 
the capacity can be expressed as $C(P) = \frac{1}{4}\log(1 + P) + o(1)$, 
where $o(1) \to 0$ as $Q \to \infty$. 

Finally, we show in Appendix III-B that a universal constant 
that bounds the difference between our upper and lower 
bounds is given by: 

$$\sup_{P, Q} \left( R_+^{II} - R_- \right) = \frac{1}{2}\log\left( \frac{3}{2} + \sqrt{2} \right) = 0.7716. \qquad (30)$$
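
The numerical value of this constant is easy to confirm, and a short experiment suggests where the gap approaches it. The sketch below rests on two assumptions of ours: that the upper bound for $Q \ge 2$ has leading term $\frac{1}{2}\log\left((1 + P + Q + 2\sqrt{PQ})/\sqrt{2Q}\right)$, as we read (15), and that along the boundary $Q = 2P + 2$ the lower bound equals the time-sharing rate $\frac{1}{4}\log(1 + P)$. The function names are hypothetical.

```python
from math import log2, sqrt


def gap_constant():
    """Closed-form value (1/2) log2(3/2 + sqrt(2))."""
    return 0.5 * log2(1.5 + sqrt(2))


def gap_on_boundary(P):
    """Upper-minus-lower bound gap along Q = 2P + 2 (assumed expressions)."""
    Q = 2 * P + 2
    upper = 0.5 * log2((1 + P + Q + 2 * sqrt(P * Q)) / sqrt(2 * Q))
    lower = 0.25 * log2(1 + P)   # time-sharing rate at the regime boundary
    return upper - lower


if __name__ == "__main__":
    print(round(gap_constant(), 4))
    for P in (1.0, 10.0, 1e3, 1e6):
        print(P, gap_on_boundary(P))
```

Under these assumptions the gap along the boundary increases with P and approaches the constant from below, which is consistent with the supremum not being attained at any finite (P, Q).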

We conclude this section with a few additional observations. 
Some Further Remarks: 

1) Extension to K receivers: Our upper bounding technique 
for $R_+^{II}$ in (15) can be extended to the case of K 
receivers, each with independent interference. We show 
in Appendix III-C that the following upper bound holds 
for the case of K receivers: 



$$\lim_{Q \to \infty} C \le \lim_{Q \to \infty} R_+^{II} = \lim_{Q \to \infty} R_+^{I} = \frac{1}{4}\log(1 + P), \qquad (26)$$

which can be achieved by time-sharing between the two users 
and doing Costa dirty paper coding for each user being served. 
We note that this result settles the conjecture made in [24]. 

Perhaps more interestingly, our proposed achievable rate 
is optimal in the limit of high SINR. The behavior of the 
bounds as a function of SNR is depicted in Fig. 6 for a 
fixed INR of Q = 15 dB. We note that the expression for 
$R_+^{II}$ coincides with $R_-$ in this limit. Note that the baseline 
schemes do not achieve a rate particularly close to capacity, but 
the superposition dirty paper coding strategy corresponding to 
our lower bound does. More generally, we can show that: 

$$\lim_{P \to \infty} (C - R_-) \le \lim_{P \to \infty} (R_+^{II} - R_-) = 0. \qquad (27)$$

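To verify (27) for $Q > 2$, since $P \to \infty$, the middle case of 
the lower bound (18) applies, which we can alternately express 
in the form 

$$R_- = \frac{1}{2}\log\left( \frac{P + Q/2 + 1}{\sqrt{2Q}} \right). \qquad (28)$$

Comparing (28) with the upper bound (15) we have, for P large enough that $Q \le 2P + 2$, 

$$R_+^{II} - R_- = \frac{1}{2}\log\left( \frac{1 + P + Q + 2\sqrt{PQ}}{1 + P + Q/2} \right), \qquad (29)$$

which in the limit $P \to \infty$ gives (27). The case $Q < 2$ can 
be similarly verified. We summarize the optimality properties 
in the following corollary. 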

Corollary 1: For the Gaussian multicast channel in Figure 4, 
the proposed achievable rate in Theorem 4 is optimal in 



1 



log 

2K ^ 



K - 1 
" 2K 

Q 



■logQ 



K{P+l) 

(31) 

By taking the limit $Q \to \infty$ in (31), it can be shown that 
time-sharing is optimal for any number of users in the 
high INR limit. 

2) Correlation between noise sequences: The upper bound
in Theorem 3 is valid even when the noises $Z_1^n$ and
$Z_2^n$ are not independent. The argument is analogous to
that for the standard broadcast channel (e.g., [7, Ch.
14]). We exploit this observation to derive the upper
bound expressions. Furthermore, analogous to the result
in [4], even if the noise is not Gaussian, our lower bound
in (19a) is achievable when the decoder treats the noise
as Gaussian.

3) Feedback does not help much: As discussed in Appendices
III-A and IV-A, the expressions for $R_+^{\mathrm{I}}$ and $R_+^{\mathrm{II}}$
in (16) and (17) continue to hold in the presence of
perfect causal feedback, provided we do not optimize
over the parameter $\rho$, but set it equal to the actual
correlation between the noise terms.

4) The capacity-achieving strategy for the binary channel
does not extend immediately to the Gaussian channel.
While one might speculate that an adaptation of the
achievability approach in Theorem 1 for the Gaussian
channel would improve on the lower bound (19a) in
Theorem 4, the obvious generalizations do not. In particular,
strategies which precancel the interference in part of the
codeword for each user achieved lower rates than our
superposition dirty paper coding; for further discussion
see [14].
A. Correlated Interferences and Robust Dirty Paper Coding

Consider a memoryless Gaussian point-to-point channel
model with output

$$Y^n = X^n + S^n + Z^n, \tag{32}$$

where $X^n$ is the channel input subject to power constraint $P$,
$S^n$ is a white Gaussian interference sequence of power $Q$ not
known to the decoder, and $Z^n$ is a white Gaussian noise sequence
of unit power. When the interference $S^n$ is perfectly known to
the encoder, Costa's dirty paper coding is capacity achieving.
However, in many applications, only imperfect knowledge
of $S^n$ is available to the encoder. One special case is the
case of causal knowledge considered by Shannon. Another
is the case of noisy noncausal knowledge. For these kinds of
generalizations, there is interest in understanding the capacity
of such channels and the structure of the associated
capacity-achieving codes, which we refer to as robust dirty paper codes.

It is often natural to analyze such problems via their equivalent
Gaussian multicast model. As an illustration, suppose
that the interference in (32) is of the form $S^n = \beta S_0^n$, where
$S_0^n \sim \mathcal{N}(0, QI)$ is known to the encoder but $\beta$ is not. Then if
$\beta$ is from a finite alphabet (or can be approximated as being
so), i.e., $\beta \in \{\beta_1, \beta_2, \ldots, \beta_K\}$, the problem is equivalent
to a Gaussian multicast problem with K users where the
interference for the kth user is $\beta_k S_0^n$.

From this example it is apparent that for at least some
applications, there is a need to accommodate correlated
interferences in the Gaussian multicast model. In what follows
we focus on the case where there are two receivers, i.e.,
$\beta \in \{\beta_1, \beta_2\}$. Extensions to the case of more than two
receivers are possible, but will not be explored.

We first provide a general upper bound for the case of
correlated, jointly Gaussian interference sequences and then
specialize it to the case of scaled interferences. The general
upper bound might be of independent interest and is derived
in Appendix V.

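Theorem 5: Consider a two-receiver channel model $Y_i^n = X^n + S_i^n + Z_i^n$
for $i = 1, 2$, where $Z_i^n$ is i.i.d. $\mathcal{N}(0, 1)$
noise, $S_1^n$ and $S_2^n$ are i.i.d. jointly Gaussian with marginal
distributions $\mathcal{N}(0, Q_1)$ and $\mathcal{N}(0, Q_2)$, respectively, and suppose
that the distribution of $S_1 - S_2$ is $\mathcal{N}(0, Q_d)$. An upper bound
on the common message rate for this channel under a power
constraint $P$ at the transmitter is given by

$$R_+^{c} = \sum_{i=1}^{2} \frac{1}{4}\log\left(P + Q_i + 1 + 2\sqrt{PQ_i}\right) - T(Q_d), \tag{33}$$

where

$$T(Q_d) = \begin{cases} \frac{1}{2}\log\left(1 + \frac{Q_d}{4}\right), & Q_d < 4 \\ \frac{1}{4}\log Q_d, & Q_d \ge 4. \end{cases} \tag{34}$$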

We note that the upper bound is of most interest in the high
signal-to-interference-plus-noise limit, i.e., when we fix $Q_1, Q_2$
and take $P \to \infty$. In this limit we have the following:

Corollary 2: In the high SINR limit ($Q_1, Q_2$ fixed, $P \to \infty$),
the upper bound for the case of correlated interferences
in Theorem 5 can be written as

$$R_+^{c} = \frac{1}{2}\log(P) - T(Q_d) + o(1), \tag{35}$$

where the term $o(1)$ approaches 0 as $P \to \infty$ with $Q_1, Q_2$
fixed, and $T(Q_d)$ is given in (34).

To establish an achievable rate, we will consider a modification
to our lower bound in Theorem 4, which considers
the case of independent interferences. To deal with the case
of correlated interferences, we will require that the encoder
and decoders have access to a common source of randomness
which will be used as a dither sequence.

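Consider a superposition dirty paper coding strategy analogous
to that in the proof of the lower bound in Theorem 4,
whereby we decompose the interferences according to (20). In
this case, we have that (21) specializes to

$$A^n = \beta_A S_0^n, \qquad D^n = \beta_D S_0^n, \tag{36}$$

where $S_0^n$ denotes the underlying interference sequence known to the encoder and

$$\beta_A = (\beta_1 + \beta_2)/2, \qquad \beta_D = (\beta_1 - \beta_2)/2. \tag{37}$$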

When we turn to implement the encoding step in the proof
of the lower bound of Theorem 4, in which $A^n$ is treated
as interference and $D^n$ as noise, the results of [6] cannot
be directly applied since the interferences $A^n$ and $D^n$ in
(36) are correlated. On the other hand, if we assume that the
encoder and decoder(s) have access to a source of common
randomness in the form of a dither sequence, we can use the
lattice coding strategy in [8]. In this scheme, the transmitted
sequence is statistically independent of the interference and
noise sequences. It can be easily shown that for such schemes,
correlation between the interference and noise sequences does
not change the achievable rate relative to the case when the
noise and interference sequences are independent.$^{11}$ With this
scheme, we obtain the following lower bound.

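Theorem 6: An achievable rate for our example multicast
channel with correlated interferences and common randomness
at the encoder and decoders is given by

$$C^{c}(P) \ge \max_{\{(P_A, P_D):\, P_A \ge 0,\, P_D \ge 0,\, P_A + P_D \le P\}} R_-^{c}(P_A, P_D), \tag{38a}$$

where

$$R_-^{c}(P_A, P_D) = \frac{1}{2}\log\left(1 + \frac{P_A}{1 + Q_d/4 + P_D}\right) + \frac{1}{4}\log(1 + P_D), \tag{38b}$$

where $Q_d = (\beta_1 - \beta_2)^2 Q$ is the variance of $S_1 - S_2$.
Optimizing over $P_A$ and $P_D$ gives the following achievable
rate:

$$R_-^{c}(P) = \begin{cases} \frac{1}{2}\log\left(\frac{P + 1 + Q_d/4}{1 + Q_d/4}\right), & Q_d < 4 \\ \frac{1}{2}\log\left(\frac{P + 1 + Q_d/4}{\sqrt{Q_d}}\right), & 4 \le Q_d \le 4(P+1) \\ \frac{1}{4}\log(1 + P), & Q_d > 4P + 4. \end{cases} \tag{39}$$

$^{11}$In fact, the result in [8] holds for an arbitrary interference sequence.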
We note that in the limit of high SINR, our expression for
$R_-^{c}$ in (39) is given by $R_-^{c} = \frac{1}{2}\log(P) - T(Q_d) + o(1)$, where
$T(Q_d)$ is given as in (34). This coincides with the upper bound
in (35) and thus establishes the optimality of our scheme in
the high SINR limit.

Corollary 3: The proposed achievable rate in Theorem 6 is
optimal in the limit of high SINR (fixed $Q_1$, $Q_2$, $P \to \infty$), i.e.,
$\lim_{P\to\infty} C^{c}(P) - R_-^{c}(P) = 0$.
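The optimization over the power split in Theorem 6 and the high-SINR claim of Corollary 3 can be checked numerically. The following is a minimal sketch, assuming rates are measured in bits (base-2 logarithms) and that all available power is used ($P_A = P - P_D$); the function names are illustrative and do not come from the paper.

```python
import math

def T(Qd):
    """Penalty term T(Q_d) of (34), in bits."""
    return 0.5 * math.log2(1 + Qd / 4) if Qd < 4 else 0.25 * math.log2(Qd)

def upper_bound(P, Q1, Q2, Qd):
    """Upper bound (33) for correlated interferences."""
    total = sum(0.25 * math.log2(P + Qi + 1 + 2 * math.sqrt(P * Qi))
                for Qi in (Q1, Q2))
    return total - T(Qd)

def rate(PA, PD, Qd):
    """Achievable rate (38b) for a given power split (P_A, P_D)."""
    return (0.5 * math.log2(1 + PA / (1 + Qd / 4 + PD))
            + 0.25 * math.log2(1 + PD))

def lower_bound_closed_form(P, Qd):
    """Closed-form optimum (39)."""
    if Qd < 4:
        return 0.5 * math.log2((P + 1 + Qd / 4) / (1 + Qd / 4))
    if Qd <= 4 * (P + 1):
        return 0.5 * math.log2((P + 1 + Qd / 4) / math.sqrt(Qd))
    return 0.25 * math.log2(1 + P)

def lower_bound_grid(P, Qd, steps=20000):
    """Brute-force maximization of (38b) over P_D in [0, P], with P_A = P - P_D."""
    return max(rate(P - P * k / steps, P * k / steps, Qd)
               for k in range(steps + 1))

if __name__ == "__main__":
    for Qd in (1.0, 8.0, 100.0):
        print(Qd, lower_bound_grid(10.0, Qd), lower_bound_closed_form(10.0, Qd))
    # Example with beta_1 = 1, beta_2 = -1, Q = 1, so Q_1 = Q_2 = 1 and Q_d = 4.
    for P in (1e2, 1e4, 1e6):
        print(P, upper_bound(P, 1.0, 1.0, 4.0) - lower_bound_closed_form(P, 4.0))
```

In this sketch the grid search and the closed form agree in all three regimes of (39), and the gap between (33) and (39) shrinks as $P$ grows, which is the behavior Corollary 3 describes.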

V. Concluding Remarks 

We introduced the multicast channel model and analyzed the 
special cases of binary and Gaussian channels with additive 
interference. Our main observation in this work is that unlike 
the single user case, the lack of side information at the receiver 
strongly limits capacity. We show that in both the binary and 
Gaussian cases if the interfering sequences are independent, 
time-sharing is optimal in the limit of large interference. Also 
certain achievable rates and their optimality properties have 
been discussed. The capacity has been established for the two 
user noiseless binary case and for the Gaussian case in the 
high signal-to-interference-plus-noise ratio limit. Somewhat 
surprisingly, the optimal schemes are very different for the 
two cases. 

It may be possible to extend the upper bounding techniques 
in this paper to more general channel models and perhaps 
also sharpen the results for the Gaussian and binary cases. 
We emphasize, however, that the proposed bounds point to an
important engineering insight: there is a significant loss in
dealing with more than one interference sequence at the
transmitter, even when they are correlated. An interesting direction
of future work would be to investigate the connections of this
result with a recent result on the MIMO broadcast channel with
imperfect channel state information at the transmitter [19],
where again it was shown that lack of perfect CSI strongly
limits the broadcast channel capacity.

Appendix I
Proof of the Converse in Theorem 1

We have to show that for any sequence of $(2^{nR}, n)$ codes
with $P_e^n \to 0$, we must have $R \le C$, where $C$ is the capacity
expression given in Theorem 1.

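Since each receiver is able to decode the message we have
from Fano's inequality

$$H(W \mid Y_k^n) \le n\epsilon_n, \quad \text{for } k = 1, 2, \tag{40}$$

where $\epsilon_n$ is a sequence that approaches 0 as $n \to \infty$. We can
use Fano's inequality to bound the rate as

$$\begin{aligned}
nR = H(W) &= H(W \mid Y_1^n) + I(W; Y_1^n) \\
&\le n\epsilon_n + H(Y_1^n) - H(Y_1^n \mid W) && (41) \\
&\le n\epsilon_n + \sum_{i=1}^{n} H(Y_{1i}) - H(Y_1^n \mid W) && (42) \\
&\le n\epsilon_n + n - H(Y_1^n \mid W), && (43)
\end{aligned}$$

where (41) follows by using the Fano inequality (40), (42)
follows from the chain rule and the fact that conditioning
reduces the entropy, and (43) follows from the fact that each
$Y_{1i}$ is binary valued. We can similarly bound the rate on the
second user's channel as

$$nR \le n\epsilon_n + n - H(Y_2^n \mid W). \tag{44}$$

Combining (43) and (44), we obtain

$$\begin{aligned}
nR &\le n - \max\{H(Y_1^n \mid W), H(Y_2^n \mid W)\} + n\epsilon_n \\
&\le n - \tfrac{1}{2}\{H(Y_1^n \mid W) + H(Y_2^n \mid W)\} + n\epsilon_n \\
&\le n - \tfrac{1}{2} H(Y_1^n, Y_2^n \mid W) + n\epsilon_n && (45) \\
&\le n - \tfrac{1}{2} H(Y_1^n \oplus Y_2^n \mid W) + n\epsilon_n && (46) \\
&= n - \tfrac{1}{2} H(S_1^n \oplus S_2^n) + n\epsilon_n && (47) \\
&= n\left(1 - \tfrac{1}{2} H(S_1 \oplus S_2)\right) + n\epsilon_n, && (48)
\end{aligned}$$

where (45) follows from the fact that conditioning reduces
entropy, (46) follows from the fact that $Y_1^n \oplus Y_2^n$ is a
deterministic function of $(Y_1^n, Y_2^n)$, (47) follows from the fact
that $Y_1 \oplus Y_2 = S_1 \oplus S_2$ and that the interference sequences are
independent of $W$, and (48) follows from the fact that
both $S_1$ and $S_2$ are i.i.d., so the joint entropy of the sequence
$S_1^n \oplus S_2^n$ is the sum of the individual terms.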
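For a concrete feel for the final expression in (48), the bound $1 - \frac{1}{2}H(S_1 \oplus S_2)$ can be evaluated for specific interference statistics. The sketch below assumes, purely for illustration, that $S_1$ and $S_2$ are independent Bernoulli($q_1$) and Bernoulli($q_2$) sequences; the parameter and function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def xor_parameter(q1, q2):
    """Pr(S_1 xor S_2 = 1) for independent Bernoulli(q1), Bernoulli(q2) bits."""
    return q1 * (1 - q2) + q2 * (1 - q1)

def converse_bound(q1, q2):
    """Per-symbol bound 1 - H(S_1 xor S_2)/2 from the last line of (48)."""
    return 1.0 - 0.5 * binary_entropy(xor_parameter(q1, q2))

if __name__ == "__main__":
    for q in (0.0, 0.1, 0.25, 0.5):
        print(q, converse_bound(q, q))
```

With $q_1 = q_2 = 1/2$ the bound equals 1/2 bit per channel use, the rate of time-sharing between the two receivers, and with no interference ($q_1 = q_2 = 0$) it equals 1 bit.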

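Appendix II
Proof of Upper Bound (9b) in Theorem 2

The upper bound mirrors the converse for the two-user case. In
particular, following the same steps as in the two-user case to
derive (45), we have that any achievable rate satisfies

$$nR \le n - \frac{1}{K} H(Y_1^n, Y_2^n, \ldots, Y_K^n \mid W) + n\epsilon_n. \tag{49}$$

Proceeding from (49) we obtain

$$\begin{aligned}
nR - n\epsilon_n &\le n - \frac{1}{K} H(Y_1^n, Y_1^n \oplus Y_2^n, \ldots, Y_1^n \oplus Y_K^n \mid W) && (50) \\
&= n - \frac{1}{K} H(X^n \oplus S_1^n, S_1^n \oplus S_2^n, \ldots, S_1^n \oplus S_K^n \mid W) \\
&= n - \frac{1}{K} H(S_1^n \oplus S_2^n, \ldots, S_1^n \oplus S_K^n \mid W) - \frac{1}{K} H(X^n \oplus S_1^n \mid S_1^n \oplus S_2^n, \ldots, S_1^n \oplus S_K^n, W) \\
&= n - \frac{n}{K} H(S_1 \oplus S_2, \ldots, S_1 \oplus S_K) - \frac{1}{K} H(X^n \oplus S_1^n \mid S_1^n \oplus S_2^n, \ldots, S_1^n \oplus S_K^n, W) && (51) \\
&\le n - \frac{n}{K} H(S_1 \oplus S_2, \ldots, S_1 \oplus S_K),
\end{aligned}$$

where (50) follows from the fact that the mapping
$(Y_1^n, Y_2^n, \ldots, Y_K^n) \to (Y_1^n, Y_1^n \oplus Y_2^n, \ldots, Y_1^n \oplus Y_K^n)$ is
invertible, and (51) follows from the fact that $S_1^n, S_2^n, \ldots, S_K^n$ are all
i.i.d. and independent of $W$. The last step drops a nonnegative conditional entropy term.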
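The invertibility used in (50) is easy to confirm by enumeration for a single time index. The following sketch (function names are illustrative) checks that the map $(y_1, \ldots, y_K) \mapsto (y_1, y_1 \oplus y_2, \ldots, y_1 \oplus y_K)$ is a bijection on binary $K$-tuples and is in fact its own inverse.

```python
from itertools import product

def forward(y):
    """Map (y_1, ..., y_K) to (y_1, y_1 xor y_2, ..., y_1 xor y_K)."""
    return (y[0],) + tuple(y[0] ^ yk for yk in y[1:])

def inverse(z):
    """Recover (y_1, ..., y_K); XORing with y_1 again undoes the map."""
    return (z[0],) + tuple(z[0] ^ zk for zk in z[1:])

def is_bijection(K):
    """Check invertibility over all 2^K binary tuples."""
    tuples = list(product((0, 1), repeat=K))
    images = {forward(y) for y in tuples}
    round_trip = all(inverse(forward(y)) == y for y in tuples)
    return len(images) == 2 ** K and round_trip

if __name__ == "__main__":
    print([is_bijection(K) for K in range(1, 6)])
```

Because the map is a bijection, the joint entropy of the received tuple is unchanged, which is what allows the step from (49) to (50).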
Appendix III
Proof of Upper Bound (15) in Theorem 3

We now derive (15) for $R_+$. We first note that the capacity
of the channel only depends on the marginal distributions
$p(Y_1^n \mid X^n, S_1^n, S_2^n)$ and $p(Y_2^n \mid X^n, S_1^n, S_2^n)$ and not on the
joint distribution $p(Y_1^n, Y_2^n \mid X^n, S_1^n, S_2^n)$. Allowing correlation
between the noises $Z_1$ and $Z_2$ does not change capacity.
Specifically, we have

Lemma 1: Let $P_e^n$ be the probability of decoding error in
(2). If $P_e^n$ is bounded away from zero for a certain correlation
between $Z_1$ and $Z_2$ above, then it is bounded away from zero
for any other correlation between $Z_1$ and $Z_2$.

Proof: The argument is essentially the same as given in
[7, Ch. 14, p. 454]. We repeat it here for completeness. Let
$P_{e,1}^n$ and $P_{e,2}^n$ denote the error probabilities in decoding at
receivers 1 and 2, respectively. We have

$$\begin{aligned}
P_{e,1}^n &= \Pr(g_1(Y_1^n) \ne W), \\
P_{e,2}^n &= \Pr(g_2(Y_2^n) \ne W), \\
P_e^n &= \Pr\Big(\bigcup_{k=1,2} \{g_k(Y_k^n) \ne W\}\Big).
\end{aligned}$$

Next, note that

$$\max\{P_{e,1}^n, P_{e,2}^n\} \le P_e^n \le P_{e,1}^n + P_{e,2}^n, \tag{52}$$

where the left inequality in (52) follows from the fact that by
definition $P_e^n \ge P_{e,k}^n$ for $k = 1, 2$, and the right inequality
follows from the union bound. In turn, note that both $P_{e,1}^n$ and
$P_{e,2}^n$ do not depend on the correlation between $Z_1$ and $Z_2$.
Accordingly, both the left- and right-hand terms in (52) do not
depend on the correlation between $Z_1$ and $Z_2$. In particular,
if $P_e^n$ is bounded away from zero for some correlation between
$Z_1$ and $Z_2$, then necessarily one of $P_{e,1}^n$ and $P_{e,2}^n$ is bounded
away from zero. Thus the probability of error is bounded away
from zero for all possible correlations. ■

In the rest of the section we will fix $E[Z_1 Z_2] = \rho$ and
derive an upper bound. Thereafter, we will optimize over $\rho$ to
tighten the upper bound. We will need the following additional
properties of $Z_1$ and $Z_2$, which are readily computed.

Lemma 2: Let $Z_1$ and $Z_2$ be standard normal, jointly
Gaussian random variables with correlation $\rho$. Define
$Z_- = (Z_1 - Z_2)/\sqrt{2}$ and $Z_+ = (Z_1 + Z_2)/\sqrt{2}$. Then $Z_+$ and $Z_-$
are independent zero-mean Gaussian random variables with
variances $1 + \rho$ and $1 - \rho$, respectively.
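Lemma 2 amounts to a one-line covariance computation: applying the rotation that maps $(Z_1, Z_2)$ to $(Z_+, Z_-)$ to the covariance matrix of $(Z_1, Z_2)$ gives a diagonal matrix, and for jointly Gaussian variables zero covariance implies independence. A minimal sketch of that computation in plain Python (helper names are illustrative):

```python
import math

def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sum_difference_covariance(rho):
    """Covariance of (Z_+, Z_-) when (Z_1, Z_2) has unit variances and correlation rho."""
    a = 1.0 / math.sqrt(2.0)
    rotation = [[a, a], [a, -a]]          # rows map (Z_1, Z_2) to Z_+ and Z_-
    cov_z = [[1.0, rho], [rho, 1.0]]
    rotation_t = [list(col) for col in zip(*rotation)]
    return matmul(matmul(rotation, cov_z), rotation_t)

if __name__ == "__main__":
    # Diagonal entries are 1 + rho and 1 - rho; off-diagonal entries vanish.
    print(sum_difference_covariance(0.3))
```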

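To obtain our upper bound we show that a sequence of
$(2^{nR}, n)$ codes that can be decoded by both the receivers with
$P_e^n \to 0$ must satisfy $R \le R_+^{\mathrm{II}}$ in (17). Note that our power
constraint is of the form $E[X_i^2] \le P_i$ with $\sum_{i=1}^{n} P_i \le nP$.

Suppose $R_1$ and $R_2$ denote the rates at which the two
receivers can reliably decode the common message. The rate
of the common message must satisfy $R \le \min(R_1, R_2)$.

From Fano's inequality, we have that for some sequence $\epsilon_n$,
which approaches 0 as $n \to \infty$,

$$H(W \mid Y_k^n) \le n\epsilon_n, \quad \text{for } k = 1, 2. \tag{53}$$

We first upper bound $R_1$ as

$$\begin{aligned}
nR_1 &\le I(W; Y_1^n) + n\epsilon_n \\
&\le \sum_{i=1}^{n} h(Y_{1i}) - h(Y_1^n \mid W) + n\epsilon_n && (54) \\
&\le \sum_{i=1}^{n} \frac{1}{2}\log 2\pi e\left(P_i + 1 + Q + 2\sqrt{P_i Q}\right) - h(Y_1^n \mid W) + n\epsilon_n && (55) \\
&\le \frac{n}{2}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) - h(Y_1^n \mid W) + n\epsilon_n, && (56)
\end{aligned}$$

where (54) follows from the chain rule and the fact that
conditioning reduces entropy, and (55) follows from the fact
that each $Y_{1i}$ has a variance no larger than $P_i + 1 + Q + 2\sqrt{P_i Q}$
and its differential entropy can be upper bounded by that of
a Gaussian RV. Finally, (56) is a consequence of Jensen's
inequality.

Similarly applying the above chain of inequalities on User
2, we have

$$nR_2 \le \frac{n}{2}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) - h(Y_2^n \mid W) + n\epsilon_n. \tag{57}$$

Now we can find an upper bound on the common information
rate using (56) and (57):

$$\begin{aligned}
nR = n\min(R_1, R_2) &\le \frac{n}{2}(R_1 + R_2) \\
&\le \frac{n}{2}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) - \frac{1}{2}h(Y_1^n \mid W) - \frac{1}{2}h(Y_2^n \mid W) + n\epsilon_n \\
&\le \frac{n}{2}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) - \frac{1}{2}h(Y_1^n, Y_2^n \mid W) + n\epsilon_n, && (58)
\end{aligned}$$

where the last inequality (58) follows from the fact that
conditioning reduces the differential entropy.

We now need to lower bound $h(Y_1^n, Y_2^n \mid W)$. In what
follows we will also use the notation $S_+^n = (S_1^n + S_2^n)/\sqrt{2}$ and
$S_-^n = (S_1^n - S_2^n)/\sqrt{2}$. Note that $S_+$ and $S_-$ are mutually independent,
Gaussian $\mathcal{N}(0, Q)$.

$$\begin{aligned}
h(Y_1^n, Y_2^n \mid W) &= h\left(\frac{Y_1^n - Y_2^n}{\sqrt{2}}, \frac{Y_1^n + Y_2^n}{\sqrt{2}} \,\Big|\, W\right) && (59) \\
&= h(S_-^n + Z_-^n, \sqrt{2}X^n + S_+^n + Z_+^n \mid W) && (60) \\
&= h(S_-^n + Z_-^n \mid W) + h(\sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) && (61) \\
&= h(S_-^n + Z_-^n) + I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) \\
&\quad + h(\sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n, S_+^n) && (62) \\
&\ge h(S_-^n + Z_-^n) + I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) \\
&\quad + h(\sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n, S_+^n, X^n) && (63) \\
&= h(S_-^n + Z_-^n) + I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) + h(Z_+^n). && (64)
\end{aligned}$$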
The above steps are justified as follows. In (59) we have
used the fact that the differential entropy is invariant to a
transformation of unit determinant. We substitute for $Y_1$ and
$Y_2$ in (60). (61) follows from the chain rule. In (62), we first
drop the conditioning over $W$ in the first term, since $(S_-^n, Z_-^n)$
are jointly independent of $W$, and expand the second term.
Finally, (63) follows from the fact that conditioning on $X^n$
further reduces the differential entropy, while (64) is a
consequence of $Z_+^n$ being independent of $(X^n, S_+^n, S_-^n, Z_-^n, W)$.

Since $S_-^n$, $Z_+^n$, $Z_-^n$ are all i.i.d. Gaussian with powers $Q$,
$1 + \rho$ and $1 - \rho$ respectively, we have from (64)

$$h(Y_1^n, Y_2^n \mid W) \ge I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) + \frac{n}{2}\log 2\pi e(Q + 1 - \rho) + \frac{n}{2}\log 2\pi e(1 + \rho). \tag{65}$$

It remains to lower bound the mutual information term
in (65). We first note that since $S_+^n$ is independent of
$(W, S_-^n, Z_-^n)$, one can drop the conditioning in the mutual
information expression.

Lemma 3: For each $n \ge 1$ and for any distribution
$p(X^n \mid S_1^n, S_2^n, W)$ such that $\sum_{i=1}^{n} E[X_i^2] \le nP$, the mutual
information term in (65) can be lower bounded as

$$I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n) \ge I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n) \ge \frac{n}{2}\log\left(\frac{Q}{2P + 1 + \rho}\right). \tag{66}$$

Proof: The left-hand inequality follows immediately by
expanding $I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n \mid W, S_-^n + Z_-^n)$ and using
the fact that $S_+^n$ is independent of $(S_-^n, Z_-^n, W)$.

The right-hand inequality is a consequence of the
rate-distortion theorem for i.i.d. Gaussian sources. Note that
$\sum_{i=1}^{n} E[(\sqrt{2}X_i + Z_{+,i})^2] \le n(2P + 1 + \rho)$. Thus if the right
inequality were violated for a certain distribution,
we could use it as a test channel in quantizing an n-dimensional
i.i.d. Gaussian source and do better than the rate-distortion
bound. Alternately, note that

$$\begin{aligned}
I(S_+^n; \sqrt{2}X^n + S_+^n + Z_+^n) &= h(S_+^n) - h(S_+^n \mid \sqrt{2}X^n + S_+^n + Z_+^n) \\
&= h(S_+^n) - h(\sqrt{2}X^n + Z_+^n \mid \sqrt{2}X^n + S_+^n + Z_+^n) && (67) \\
&\ge h(S_+^n) - h(\sqrt{2}X^n + Z_+^n) && (68) \\
&\ge h(S_+^n) - \sum_{i=1}^{n} h(\sqrt{2}X_i + Z_{+,i}) && (69) \\
&\ge \frac{n}{2}\log 2\pi e Q - \sum_{i=1}^{n} \frac{1}{2}\log 2\pi e(2P_i + 1 + \rho) && (70) \\
&\ge \frac{n}{2}\log Q - \frac{n}{2}\log(2P + 1 + \rho) && (71) \\
&= \frac{n}{2}\log\left(\frac{Q}{2P + 1 + \rho}\right). && (72)
\end{aligned}$$

Here (67) follows from the fact that $h(X \mid Y) = h(Y - X \mid Y)$,
(68) from the fact that removing the conditioning on
$\sqrt{2}X^n + S_+^n + Z_+^n$ only increases the differential entropy, (69)
follows from the chain rule, (70) follows from the fact that
the differential entropy with a fixed variance is maximized
for a Gaussian distribution, and (71) follows from Jensen's
inequality. This establishes (66). ■
Finally, by substituting (66) and (65) into (58), we get

$$R \le \frac{1}{2}\log\left(\frac{P + Q + 1 + 2\sqrt{PQ}}{\sqrt{(Q + 1 - \rho)(1 + \rho)}}\right) - \frac{1}{4}\log\left(\frac{Q}{2P + 1 + \rho}\right). \tag{73}$$

Since $\rho$ is a free parameter of choice, we can select
it to be the value that minimizes (73), and thus (17) follows.
To obtain the tightest possible bound we can optimize over the
value of $\rho$. We obtain (15) by selecting the following choice
for $\rho$:

$$\rho^*(Q) = \begin{cases} Q/2 & \text{if } Q < 2 \\ 1 & \text{if } Q \ge 2. \end{cases} \tag{74}$$



A. Gains from Feedback

In the presence of feedback, the transmitted symbol at time
$i$ depends on the past outputs, i.e., $X_i = f(w, y_1^{i-1}, y_2^{i-1}, s^n)$.
In this situation $Z_{+,i}$ is still independent of $(W, Z_-^n, S^n, X_1^i)$.
This condition suffices for deriving the bounds in (58), (65)
and (66). Lemma 1 does not hold, however, since now the
joint distribution between the noise sequences does matter in the
probability of error. So while the expression (73) holds, one
cannot optimize over $\rho$, but must select the value to be the
actual correlation coefficient in the channel.



B. Universal Gap between Upper and Lower Bounds

In this section we verify (30), the gap between the upper and
lower bounds for all values of $P$ and $Q$. We consider three
different cases.

For $Q < 2$, we have

$$R_+ - R_- = \frac{1}{2}\log\left(\frac{P + Q + 1 + 2\sqrt{PQ}}{P + 1 + Q/2}\right). \tag{75}$$

It can be verified that the maximum for $P \ge 0$ and $0 \le Q \le 2$
occurs for $Q = 2$ and $P = \frac{1}{4}(9 - \sqrt{17})$. The maximum value
is $\frac{1}{2}\log((5 + \sqrt{17})/4) \approx 0.5947$.

For the case $2 \le Q \le 2(P + 1)$ the difference is also
given by (75). The supremum is attained when we set
$Q = 2(P+1)$ and let $P \to \infty$. The supremum value is
$\frac{1}{2}\log((3 + 2\sqrt{2})/2) \approx 0.7716$.

Finally, for the case $Q > 2(P + 1)$, the difference between
the bounds is given by

$$R_+ - R_- = \frac{1}{2}\log\left(\frac{P + Q + 1 + 2\sqrt{PQ}}{Q}\right).$$

The supremum is obtained by taking $Q = 2(P + 1)$ and letting
$P \to \infty$ and again equals $\frac{1}{2}\log((3 + 2\sqrt{2})/2)$.
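The numerical constants quoted in this subsection can be reproduced directly from the two gap expressions. The sketch below assumes base-2 logarithms (the quoted values 0.5947 and 0.7716 are consistent with bits); the function name `gap` and the search grid are illustrative choices.

```python
import math

SUPREMUM = 0.5 * math.log2((3 + 2 * math.sqrt(2)) / 2)   # about 0.7716 bits

def gap(P, Q):
    """Difference between the upper and lower bounds, in bits.

    Uses (75) for Q <= 2(P+1) and the third-case expression otherwise.
    """
    numerator = P + Q + 1 + 2 * math.sqrt(P * Q)
    denominator = P + 1 + Q / 2 if Q <= 2 * (P + 1) else Q
    return 0.5 * math.log2(numerator / denominator)

if __name__ == "__main__":
    P_star = (9 - math.sqrt(17)) / 4
    print(gap(P_star, 2.0))                  # maximum over 0 <= Q <= 2
    print(gap(1e8, 2 * (1e8 + 1)), SUPREMUM) # approaches the supremum
    grid = [10 ** (e / 4) for e in range(-12, 25)]
    print(max(gap(P, Q) for P in grid for Q in grid))
```

On the grid used here the gap stays strictly below the supremum value, which is approached only along $Q = 2(P+1)$ with $P \to \infty$.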
C. The Case of K Receivers

We consider the case where there are $K$ receivers. To get an
upper bound, we assume perfect correlation between the noise
sequences, i.e., receiver $k = 1, 2, \ldots, K$ gets $Y_k^n = X^n + S_k^n + Z^n$,
where the interferences $S_k^n$ are mutually independent and
i.i.d. $\mathcal{N}(0, Q)$ and $Z^n$ is i.i.d. $\mathcal{N}(0, 1)$.

To upper bound the common rate for the case of $K$
receivers, first note that the derivation that leads to (58) can
be straightforwardly generalized to yield

$$nR \le \frac{n}{2}\log 2\pi e\left(P + Q + 1 + 2\sqrt{PQ}\right) - \frac{1}{K} h(Y_1^n, Y_2^n, \ldots, Y_K^n \mid W) + n\epsilon_n. \tag{76}$$

We now consider generalizing our derivation for (65)
to lower bound $h(Y_1^n, \ldots, Y_K^n \mid W)$. Let us consider a
set of $K$ orthogonal unit-norm vectors $v_1, v_2, \ldots, v_K$, where
$v_1 = \frac{1}{\sqrt{K}}[1, 1, \ldots, 1]$ and $v_2, \ldots, v_K$ are arbitrarily chosen. Let
$\mathbf{Y}^n = (Y_1^n, Y_2^n, \ldots, Y_K^n)$ denote the $K$-tuple of received
sequences.

Claim 1: The component-wise inner product of $\mathbf{Y}^n$ with
$v_1, \ldots, v_K$ satisfies

$$\begin{aligned}
\langle \mathbf{Y}^n, v_1 \rangle &= \sqrt{K}X^n + \sqrt{K}Z^n + T_1^n, \\
\langle \mathbf{Y}^n, v_j \rangle &= T_j^n \quad \text{for } j = 2, 3, \ldots, K,
\end{aligned} \tag{77}$$

where $T_1^n, T_2^n, \ldots, T_K^n$ are mutually independent, i.i.d. Gaussian
$\mathcal{N}(0, Q)$ sequences.

Proof: The expression for $\langle \mathbf{Y}^n, v_1 \rangle$ can be verified by
direct substitution. Here $T_1^n = \frac{1}{\sqrt{K}}(S_1^n + S_2^n + \ldots + S_K^n)$.
Since $v_j$ and $v_1$ are mutually orthogonal for $j \ge 2$, we have
$\sum_{k=1}^{K} v_{jk} = 0$. Hence $\langle \mathbf{Y}^n, v_j \rangle = \sum_{k=1}^{K} v_{jk} S_k^n$. We denote
$T_j^n = \sum_{k=1}^{K} v_{jk} S_k^n$. Since the $S_k^n$ are mutually independent
and i.i.d. and the $v_j$ are mutually orthogonal with unit norm, it follows that the $T_j^n$
are all mutually independent and i.i.d. $\mathcal{N}(0, Q)$. ■
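One concrete way to obtain vectors with the properties used in Claim 1 is Gram-Schmidt orthonormalization starting from the all-ones vector. This is only an illustrative construction (the claim allows $v_2, \ldots, v_K$ to be chosen arbitrarily), written in plain Python with hypothetical function names.

```python
import math

def orthonormal_basis_with_ones(K):
    """Gram-Schmidt on [ones, e_1, ..., e_{K-1}]; the first vector is ones/sqrt(K)."""
    candidates = [[1.0] * K] + [
        [1.0 if i == j else 0.0 for i in range(K)] for j in range(K - 1)
    ]
    basis = []
    for candidate in candidates:
        v = candidate[:]
        for b in basis:
            projection = sum(x * y for x, y in zip(v, b))
            v = [x - projection * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis

def gram_matrix(vectors):
    """Matrix of pairwise inner products."""
    return [[sum(x * y for x, y in zip(u, w)) for w in vectors] for u in vectors]

if __name__ == "__main__":
    V = orthonormal_basis_with_ones(4)
    # Rows 2..K sum to zero, so X^n and Z^n cancel in <Y^n, v_j> for j >= 2.
    print([round(sum(row), 12) for row in V])
```

Since the rows are orthonormal, applying them to i.i.d. $\mathcal{N}(0, Q)$ interferences leaves the covariance equal to $Q$ times the identity, which is why the $T_j^n$ are independent with variance $Q$.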
We can now lower bound $h(Y_1^n, Y_2^n, \ldots, Y_K^n \mid W)$ in a manner
analogous to the derivation in (65):

$$\begin{aligned}
h(Y_1^n, \ldots, Y_K^n \mid W) &= h(\langle \mathbf{Y}^n, v_1 \rangle, \langle \mathbf{Y}^n, v_2 \rangle, \ldots, \langle \mathbf{Y}^n, v_K \rangle \mid W) && (78) \\
&= h(\sqrt{K}X^n + \sqrt{K}Z^n + T_1^n, T_2^n, \ldots, T_K^n \mid W) && (79) \\
&= h(T_2^n) + \ldots + h(T_K^n) + h(\sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid T_2^n, \ldots, T_K^n, W) && (80) \\
&= \frac{n(K-1)}{2}\log 2\pi e Q + h(\sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid W, \{T_j^n\}_{j=2}^{K}) && (81) \\
&= \frac{n(K-1)}{2}\log 2\pi e Q + h(\sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid W, \{T_j^n\}_{j=1}^{K}) \\
&\quad + I(T_1^n; \sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid T_2^n, \ldots, T_K^n, W) && (82) \\
&\ge \frac{n(K-1)}{2}\log 2\pi e Q + \frac{n}{2}\log 2\pi e K \\
&\quad + I(T_1^n; \sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid T_2^n, \ldots, T_K^n, W) && (83) \\
&\ge \frac{n(K-1)}{2}\log 2\pi e Q + \frac{n}{2}\log 2\pi e K + \frac{n}{2}\log\left(\frac{Q}{K(P+1)}\right). && (84)
\end{aligned}$$

The justification for the above steps is as follows. In (78)
we have used the fact that the differential entropy is invariant to
a rotation, while (79) follows from Claim 1. In (80) and (81)
we have used the fact that the $T_j^n$ are mutually independent, i.i.d.
and independent of $W$. Eq. (83) follows by additionally
conditioning the entropy term in (82) on $X^n$ and using the fact
that $Z^n$ is independent of $(W, X^n, T_1^n, \ldots, T_K^n)$. Finally, (84)
follows from the fact that since $T_1^n$ is independent of $\{T_j^n\}_{j=2}^{K}$ and
$W$, we can use an argument analogous to that in Lemma 3
to have

$$I(T_1^n; \sqrt{K}X^n + \sqrt{K}Z^n + T_1^n \mid T_2^n, \ldots, T_K^n, W) \ge \frac{n}{2}\log\left(\frac{Q}{K(P+1)}\right).$$

Finally, substituting (84) in (76) we obtain (31).

Fig. 7. Two-user Gaussian channel with a single interference sequence. We
derive an upper bound on the capacity of this channel and show that this is also
an upper bound for the two-interference channel in Fig. 4. Here only receiver
2 experiences additive white Gaussian interference of variance Q.



Appendix IV
Proof of Upper Bound (14) in Theorem 3

Our proof is structured as follows. We derive an upper
bound for a particular single-interference Gaussian channel,
and reason that the capacity of the two-interference channel
of interest in Theorem 3 cannot be higher.

As shown in Figure 7, the single-interference channel is
one in which $S_1^n = 0$ and $S_2^n = S^n$. Only the second receiver
experiences interference.

The subsequent two lemmas establish that an upper bound
on the capacity of the single-interference channel is also an
upper bound on the capacity of the two-interference channel
in Figure 4.

Lemma 4: Suppose that for the single-interference channel
model in Figure 7, the encoder and decoder 1 have access
to a source of common randomness $\Theta$, which is independent
of the message $W$ and $(S, Z_1, Z_2)$. Then the capacity of the
single-interference Gaussian channel is at least as large as that of the
channel with two independent interferences in Figure 4.

Proof: The proof follows by observing that using the
source of common randomness $\Theta$, we can generate an i.i.d.
Gaussian $\mathcal{N}(0, Q)$ sequence $S_\theta^n$ for any value of $n$. This
sequence is independent of all other channel parameters and
is known to both the encoder and decoder 1. It is used to
simulate the two independent interference channel as follows.
Decoder 1 simply adds this sequence to the received output
and ignores its knowledge in decoding. The encoder has to deal
with two sequences $(S_\theta^n, S^n)$, both i.i.d. Gaussian $\mathcal{N}(0, Q)$.
With this transformation, any coding scheme for the
two-interference channel in Figure 4 can be used over this channel
with arbitrarily small probability of error. ■

Lemma 5: A source of common randomness $\Theta$, which is
independent of the message $W$ and the channel parameters
$(S, Z_1, Z_2)$, cannot increase the capacity of the single-interference
channel in Figure 7.

Proof: Our proof is analogous to the proof that common
randomness does not increase the capacity in the single-user
case in [8]. We argue that for any sequence of codes, given
a stochastic encoder and decoder that depend on the shared
random variable $\Theta$, there exists a deterministic encoder and
decoder with no larger probability of error.

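Given the message $m$, the state sequence $s^n$, and a realization
$\theta$ of the shared random variable, let the encoding function
be given by $x^n = f(m, s^n, \theta)$. Similarly,
the decoding functions are given by $\hat{m}_k = g_k(y_k^n, \theta)$ for
$k = 1, 2, \ldots, K$. With $M = 2^{nR}$ messages, the average probability of error for the rate
$R$ randomized code is then defined by

$$\begin{aligned}
P_e^n &= \frac{1}{M}\sum_{m=1}^{M} E_\Theta\Big[\sum_{s^n} \sum_{y^n:\, \exists k:\, g_k(y_k^n, \Theta) \ne m} p(s^n)\, p(y^n \mid f(m, s^n, \Theta), s^n)\Big] \\
&= E_\Theta\Big[\frac{1}{M}\sum_{m=1}^{M} \sum_{s^n} \sum_{y^n:\, \exists k:\, g_k(y_k^n, \Theta) \ne m} p(s^n)\, p(y^n \mid f(m, s^n, \Theta), s^n)\Big] \\
&= E_\Theta\big[P_e^n(\Theta)\big],
\end{aligned}$$

where $P_e^n(\theta)$ denotes the error probability of the deterministic code obtained by fixing $\Theta = \theta$,
the second equality follows by interchanging the expectation
and summation over $m$, and the third equality follows
by observing that given a realization of the random variable
$\Theta$, the encoding and decoding are both deterministic and we
can use the definition of the average probability of error in (2).
Finally, note that there must be some value $\theta^*$ of $\theta$ for which the
term inside the expectation is minimized. We can design the
encoding and decoding functions for this deterministic value $\theta = \theta^*$,
and our probability of error will be no larger than the average.
Thus having access to common randomness cannot decrease
the probability of error for the channel of interest. ■

Lemmas 4 and 5 imply that an upper bound on the capacity
of the single-interference channel in Figure 7 is also an
upper bound on the two independent-interference channel in
Figure 4. So we will derive an upper bound for the former.

Invoking the result of Lemma 1, we can let $E[Z_1 Z_2] = \rho$,
where $\rho \in [-1, 1]$ will be optimized later. As in the previous
appendix, define $Z_- = (Z_1 - Z_2)/\sqrt{2}$ and $Z_+ = (Z_1 + Z_2)/\sqrt{2}$.

Suppose $R_1$ and $R_2$ denote the rates at which the two
receivers can reliably decode the common message. The rate of
the common message must satisfy $R \le \min(R_1, R_2)$. Similar
to our derivation in Appendix III, we use Fano's inequality to
bound $R_1$ and $R_2$ as

$$nR_1 \le \frac{n}{2}\log 2\pi e(P + 1) - h(Y_1^n \mid W) + n\epsilon_n, \tag{85}$$

$$nR_2 \le \frac{n}{2}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) - h(Y_2^n \mid W) + n\epsilon_n. \tag{86}$$

Our bound for $R$ follows the derivation analogous to that
for (58) and is given by

$$nR \le \frac{n}{4}\log 2\pi e\left(P + 1 + Q + 2\sqrt{PQ}\right) + \frac{n}{4}\log 2\pi e(P + 1) - \frac{1}{2}h(Y_1^n, Y_2^n \mid W) + 2n\epsilon_n. \tag{87}$$

It remains to lower bound the joint-entropy term in (87).

$$\begin{aligned}
h(Y_1^n, Y_2^n \mid W) &= h\left(\frac{Y_1^n - Y_2^n}{\sqrt{2}}, \frac{Y_1^n + Y_2^n}{\sqrt{2}} \,\Big|\, W\right) && (88) \\
&= h\left(Z_-^n - \frac{S^n}{\sqrt{2}},\ \sqrt{2}X^n + \frac{S^n}{\sqrt{2}} + Z_+^n \,\Big|\, W\right) \\
&\ge h\left(Z_-^n - \frac{S^n}{\sqrt{2}}\right) + h\left(\sqrt{2}X^n + \frac{S^n}{\sqrt{2}} + Z_+^n \,\Big|\, W, Z_-^n - \frac{S^n}{\sqrt{2}}, S^n, X^n\right) && (89) \\
&= h\left(Z_-^n - \frac{S^n}{\sqrt{2}}\right) + h(Z_+^n) && (90) \\
&= \frac{n}{2}\log 2\pi e\left(\frac{Q}{2} + 1 - \rho\right) + \frac{n}{2}\log 2\pi e(1 + \rho). && (91)
\end{aligned}$$

In the above steps, (88) follows from the fact that differential
entropy is invariant under a pure rotation, (89) follows
from the fact that the pair $(S^n, Z_-^n)$ is independent of $W$ and
conditioning on additional terms only reduces the second term,
while (90) follows from the fact that $Z_+^n$ is independent of
all other variables in the second term.

Substituting (91) into (87) and rearranging, we get

$$R \le \frac{1}{4}\log\left(\frac{(1 + P)\left(P + Q + 1 + 2\sqrt{PQ}\right)}{(1 + \rho)(Q/2 + 1 - \rho)}\right). \tag{92}$$

Thus we have shown the expression for (16). To obtain the
tightest bound we minimize the right-hand side of the above
over $\rho$. The tightest bound is obtained with the choice

$$\rho^*(Q) = \begin{cases} Q/4 & \text{if } Q < 4 \\ 1 & \text{if } Q \ge 4. \end{cases} \tag{93}$$

Substituting this value of $\rho$ in (92) yields (14).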

A. Gains from Feedback

As noted in Appendix III-A, in the presence of causal feedback
it still holds that $Z_{+,i}$ is independent of $(W, Z_-^n, S^n, X_1^i)$.
It can be verified that with this condition, the derivation that
leads to (91) continues to hold and the upper bound in (92)
remains valid. One cannot, however, optimize over $\rho$ in the
presence of feedback, as Lemma 1 fails to hold in the presence
of feedback.



Appendix V
Case of Correlated Interferences

In this section, we present the derivation of the upper bound
in Theorem 5. The derivation is a minor modification of the
derivation for the case of independent interferences, so only
the steps that need to be modified will be presented. As in the
statement of the theorem, we assume that $S_1 \sim \mathcal{N}(0, Q_1)$,
$S_2 \sim \mathcal{N}(0, Q_2)$ and $S_1 - S_2 \sim \mathcal{N}(0, Q_d)$.

We first note that using Fano's inequality and the steps that
lead to (58) in Appendix III, an upper bound on the common
rate can be shown to be

$$nR \le \frac{1}{2}h(Y_1^n) + \frac{1}{2}h(Y_2^n) - \frac{1}{2}h(Y_1^n, Y_2^n \mid W) + n\epsilon_n. \tag{94}$$

Using the power constraint, we upper bound
$h(Y_i^n) \le \frac{n}{2}\log 2\pi e\left(P + Q_i + 1 + 2\sqrt{PQ_i}\right)$ for $i = 1, 2$. It remains to
lower bound the joint entropy term. In what follows, we denote
$Z_+^n = (Z_1^n + Z_2^n)/2$ and $Z_-^n = Z_1^n - Z_2^n$. Note that $Z_+^n$ and $Z_-^n$ are
mutually independent and i.i.d. samples from $\mathcal{N}(0, (1 + \rho)/2)$
and $\mathcal{N}(0, 2(1 - \rho))$, respectively.

$$\begin{aligned}
h(Y_1^n, Y_2^n \mid W) &= h\left(Y_1^n - Y_2^n, \frac{Y_1^n + Y_2^n}{2} \,\Big|\, W\right) && (95) \\
&= h\left(S_1^n - S_2^n + Z_-^n,\ X^n + \frac{S_1^n + S_2^n}{2} + Z_+^n \,\Big|\, W\right) \\
&= h(S_1^n - S_2^n + Z_-^n) + h\left(X^n + \frac{S_1^n + S_2^n}{2} + Z_+^n \,\Big|\, W, S_1^n - S_2^n + Z_-^n\right) && (96) \\
&\ge h(S_1^n - S_2^n + Z_-^n) + h(Z_+^n) && (97) \\
&= \frac{n}{2}\log 2\pi e\left(Q_d + 2(1 - \rho)\right) + \frac{n}{2}\log 2\pi e\left(\frac{1 + \rho}{2}\right).
\end{aligned}$$

Here (95) follows from the fact that the transformation
$\begin{bmatrix} 1 & -1 \\ 1/2 & 1/2 \end{bmatrix}$ has unit determinant and the differential
entropy is invariant to this transformation, (96) from the fact
that $S_1^n - S_2^n + Z_-^n$ is independent of $W$, and (97) from the
fact that $Z_+^n$ is independent of all other variables. The optimal
value of $\rho$, which yields the largest value for the lower bound,
is given by $\rho^* = \min(1, Q_d/4)$ and the corresponding lower
bound is given by

$$h(Y_1^n, Y_2^n \mid W) \ge \begin{cases} \frac{n}{2}\log (2\pi e)^2\left(1 + \frac{Q_d}{4}\right)^2 & \text{if } Q_d < 4 \\ \frac{n}{2}\log (2\pi e)^2 Q_d & \text{if } Q_d \ge 4. \end{cases} \tag{98}$$

Finally, substituting (98) in (94) gives us the expression in (33).
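The choice $\rho^* = \min(1, Q_d/4)$ maximizes the product of the two variances appearing after (97), namely $(Q_d + 2(1 - \rho))(1 + \rho)/2$. A minimal numerical sketch of this optimization, using a plain grid search over $\rho \in [-1, 1]$ (function names and grid size are illustrative):

```python
def variance_product(rho, Qd):
    """Product of Var(S_1 - S_2 + Z_-) and Var(Z_+) as a function of rho."""
    return (Qd + 2.0 * (1.0 - rho)) * (1.0 + rho) / 2.0

def best_rho_grid(Qd, steps=20000):
    """Grid search for the maximizing rho in [-1, 1]."""
    grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda rho: variance_product(rho, Qd))

if __name__ == "__main__":
    for Qd in (1.0, 2.0, 4.0, 10.0):
        rho = best_rho_grid(Qd)
        print(Qd, rho, variance_product(rho, Qd))
```

At the optimum the product equals $(1 + Q_d/4)^2$ for $Q_d < 4$ and $Q_d$ for $Q_d \ge 4$, which are the two cases of (98) and lead to the two cases of $T(Q_d)$ in (34).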